Predicting F0 and voicing from NAM-captured whispered speech

Authors

Viet-Anh Tran

Gérard Bailly

Hélène Loevenbruck

Tomoki Toda

Abstract

The NAM-to-speech conversion proposed by Toda and colleagues, which converts Non-Audible Murmur (NAM) to audible speech through a statistical mapping trained on aligned corpora, is a very promising technique. Its performance is still insufficient, however, mainly because the F0 of the converted voice is difficult to estimate from unvoiced speech. In this paper, we propose a method to improve F0 estimation and the voicing decision in a NAM-to-speech conversion system based on Gaussian Mixture Models (GMM) applied to whispered speech. Instead of combining the voicing decision and F0 estimation in a single GMM, a simple feed-forward neural network detects voiced segments in the whisper, while a GMM trained on voiced segments estimates a continuous melodic contour. The error rate of the network's voiced/unvoiced decision is 6.8%, compared to 9.2% with the original system. The proposal also reduces the F0 estimation error.
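The split described in the abstract, a separate voicing decision gating a GMM-based F0 estimate, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the use of a plain conditional-mean (MMSE) mapping under a joint GMM over source and target features, the 0.5 threshold, and the convention that 0.0 marks an unvoiced frame. The voicing probability is assumed to come from some external classifier, such as a feed-forward network, which is not implemented here.

```python
import numpy as np


def gmm_conditional_mean(x, weights, means, covs, dx):
    """MMSE estimate E[y | x] under a joint GMM over z = [x, y].

    x: (dx,) source feature vector; weights: (K,);
    means: (K, dx + dy); covs: (K, dx + dy, dx + dy).
    Returns a (dy,) vector. Illustrative only, not the paper's mapping.
    """
    n_comp = len(weights)
    log_post = np.empty(n_comp)
    cond_means = []
    for k in range(n_comp):
        mu_x, mu_y = means[k, :dx], means[k, dx:]
        s_xx = covs[k, :dx, :dx]
        s_yx = covs[k, dx:, :dx]
        diff = x - mu_x
        sol = np.linalg.solve(s_xx, diff)  # S_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(s_xx)
        # Log of weight_k * N(x; mu_x, S_xx), used for the posterior.
        log_post[k] = np.log(weights[k]) - 0.5 * (
            diff @ sol + logdet + dx * np.log(2.0 * np.pi)
        )
        # Per-component conditional mean of y given x.
        cond_means.append(mu_y + s_yx @ sol)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return post @ np.array(cond_means)


def predict_f0(x, voiced_prob, weights, means, covs, dx, threshold=0.5):
    """Gate the continuous F0 estimate with a separate voicing decision.

    voiced_prob is assumed to be produced by an external classifier.
    Returns 0.0 for frames judged unvoiced (a convention assumed here).
    """
    if voiced_prob < threshold:
        return 0.0
    return float(gmm_conditional_mean(x, weights, means, covs, dx)[0])
```

Because the GMM in this sketch always produces a value, the voiced/unvoiced classifier alone decides where the contour is actually used, which mirrors the separation of the two tasks described above.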

Related papers

Improvement to a NAM-captured Whisper-to-speech System

Exploiting a tissue-conductive sensor – a stethoscopic microphone – the system developed at NAIST which converts Non-Audible Murmur (NAM) to audible speech by GMM-based statistical mapping is a very promising technique. The quality of the converted speech is however still insufficient for computer-mediated communication, notably because of the poor estimation of F0 from unvoiced speech and beca...


Improvement to a NAM captured whisper-to-speech system


Perception of Tone in Whispered Mandarin Sentences: The Case for Singapore Mandarin

Whispering is commonly used when one needs to speak softly (for instance, in a library). Whispered speech mainly differs from neutral speech in that voicing, and thus its acoustic correlate F0, is absent. It is well known that in tonal languages such as Mandarin, tone identity is primarily conveyed by the F0 contour. Previous works also suggest that secondary correlates are both consistent and ...


Voicing assimilation in whispered speech

A large body of literature has shown that phonemic voicing contrasts are preserved in the production and perception of whispered speech. Nevertheless, it is unclear to what extent allophonic voicing is also maintained in whisper. The present study investigates whether a non-contrastive voicing distinction in Spanish fricatives – which results from voice assimilation in obstruent clusters – is a...


Aerodynamic and durational cues of phonological voicing in whisper

This paper presents analyses on the phonological voicing contrast in whispered speech, which is characterized by the absence of vocal fold vibrations. In modal speech, besides glottal vibration, the contrast between voiced and unvoiced consonants is realized by other phonetic correlates: e.g. consonant and pre-consonantal vowel durations, intraoral pressure differences. The analysis of these vo...


Publication date: 2008
